Computing exact P-values for DNA motifs

Authors

  • Jing Zhang
  • Bo Jiang
  • Ming Li
  • John Tromp
  • Xuegong Zhang
  • Michael Q. Zhang
Abstract

MOTIVATION: Many heuristic algorithms have been designed to approximate P-values of DNA motifs described by position weight matrices, for evaluating their statistical significance. They often significantly deviate from the true P-value by orders of magnitude. Exact P-value computation is needed for ranking the motifs. Furthermore, surprisingly, the complexity of the problem is unknown.

RESULTS: We show the problem to be NP-hard, and present MotifRank, software based on dynamic programming, to calculate exact P-values of motifs. We define the exact P-value on a general and more precise model. Asymptotically, MotifRank is faster than the best exact P-value computing algorithm, and is in fact practical. Our experiments clearly demonstrate that MotifRank significantly improves the accuracy of existing approximation algorithms.

AVAILABILITY: MotifRank is available from http://bio.dlg.cn.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
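To make the dynamic-programming idea in the abstract concrete, here is a minimal sketch of a textbook score-distribution recurrence. It is not MotifRank's algorithm, and it assumes a simpler setting than the "general and more precise model" the abstract mentions: integer position weight matrix scores, an i.i.d. background, and a P-value defined as the probability that a random site scores at least a given threshold. The function name `pwm_score_pvalue` and the toy matrix are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def pwm_score_pvalue(pwm, background, threshold):
    """Return P(score >= threshold) for a random site under an i.i.d. background.

    pwm: list of dicts, one per motif position, mapping base -> integer score.
    background: dict mapping base -> probability (Fractions keep the result exact).
    """
    # dist maps a partial score to the probability of reaching it.
    dist = {0: Fraction(1)}
    for column in pwm:
        nxt = defaultdict(Fraction)
        for score, prob in dist.items():
            for base, p in background.items():
                # Extend every partial score by one more position.
                nxt[score + column[base]] += prob * p
        dist = nxt
    # Sum the probability mass at or above the threshold.
    return sum((p for s, p in dist.items() if s >= threshold), Fraction(0))


# Toy two-position matrix and uniform background (illustrative values only).
toy_pwm = [
    {"A": 2, "C": 0, "G": 0, "T": -1},
    {"A": 0, "C": 1, "G": 0, "T": 0},
]
uniform = {base: Fraction(1, 4) for base in "ACGT"}

print(pwm_score_pvalue(toy_pwm, uniform, 2))  # 4 of 16 sites score >= 2, so 1/4
```

Using `Fraction` rather than floating point keeps the result exact in this toy case; the table of reachable scores grows with the score range, which is why integer (or discretized) scores are assumed here.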

Similar resources

Computing exact p-values for DNA motifs (Part I)

Motivation: Many heuristic algorithms have been designed to approximate p-values of DNA motifs described by position weight matrices, for evaluating their statistical significance. They often significantly deviate from the true p-value by orders of magnitude. Exact p-value computation is needed for ranking the motifs. Furthermore, surprisingly, the complexity of the

A bi-level linear programming problem for computing the nadir point in MOLP

Computing the exact ideal and nadir criterion values is a very important subject in multi-objective linear programming (MOLP) problems. In fact, these values define the ideal and nadir points as lower and upper bounds on the nondominated points. Whereas determining the ideal point is an easy work, because it is equivalent to optimize a convex function (linear function) over a con...

Computing Exact p-Value for Structured Motif

Extracting motifs from a set of DNA sequences is important in computational biology. Occurrence probability is a commonly used statistic for evaluating the statistical significance of a motif. A main problem is how to calculate the occurrence probability of the motif on the random model of DNA sequence efficiently and accurately. In this paper, we are interested in a particular motif model which is...

P-value-based regulatory motif discovery using positional weight matrices.

To analyze gene regulatory networks, the sequence-dependent DNA/RNA binding affinities of proteins and noncoding RNAs are crucial. Often, these are deduced from sets of sequences enriched in factor binding sites. Two classes of computational approaches exist. The first describe binding motifs by sequence patterns and search the patterns with highest statistical significance for enrichment. The ...

Efficient representation and P-value computation for high-order Markov motifs

MOTIVATION Position weight matrices (PWMs) have become a standard for representing biological sequence motifs. Their relative simplicity has favoured the development of efficient algorithms for diverse tasks such as motif identification, sequence scanning and statistical significance evaluation. Markov chain-based models generalize the PWM model by allowing for interposition dependencies to be c...

Journal:
  • Bioinformatics

Volume 23, Issue 5

Publication date: 2007